Update Manual

translator.library - Version 43.1
Update from Version 42.4

3 July 1997
M. L. Barlow


1. Status.

   Version 42.4/43.1 of the translator library is not in the public
   domain. Source is available for Version 42.4. The library and
   accent files are freely distributable provided no profit is made
   from them. Accent files may have additional or separate
   restrictions placed on them by their authors.


2. Introduction.

   This version of the Translator Library was developed from the
   source code posted to the Aminet by Francesco Devitt. As such it
   remains largely his work. This version adds new accent file rules
   to facilitate number expression and fixes three problems I have
   encountered with the original Translator42. Version 43.1 adds the
   placement of the Narrator escape code sequence after the end of
   the translated text output string to achieve compatibility with
   those programs that give the whole translation buffer to the
   Narrator.

   The basic changes from version 42.4 to 43.0 are as follows:

   (Enhanced Syntax for Accent Files)

   2.1 Added Empty Match Condition.

       Extra text may be inserted into the output string based on the
       text pattern defined by the prefix and suffix rules alone.
       This allows for the insertion of "thousand" or "hundred" in
       number strings with a single statement. An empty match is
       indicated by [¶] or [¶@] in the match string (¶ is ALT-P).
       Only one empty match is allowed at any text location.

   2.2 Added Suffix Text Induction Feature.

       Suffix pattern matching text characters may now be pulled into
       the bracket delimited text-replacement string. Text may be
       pulled in ahead {& replace-text } or behind {! replace-text}
       the replacement text. Multiple characters may be pulled in by
       repeating the & or ! characters. The whole suffix match will
       be pulled in if {&* or {!* is specified. This last feature can
       be used to convert $45,701 to: {45,701 dollars} with rule:

           %class numeric 0 1 2 3 4 5 6 7 8 9 \. \,
           [$](numeric+) = {&* dollars}
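
The suffix text induction described in 2.2 can be pictured with a short
sketch. This is only an illustration in Python, not the library's code:
the names induce() and dollar_rule() are hypothetical, and the matching
is reduced to the single dollar rule quoted above.

```python
# Illustrative sketch only; induce() and dollar_rule() are hypothetical
# names, not part of translator.library.

NUMERIC = set("0123456789.,")   # the "numeric" class from the rule above


def induce(template: str, suffix: str) -> str:
    """Build a replacement string from a template such as '&* dollars'.

    '&' puts the induced suffix text ahead of the replacement text,
    '!' puts it behind. Repeating the indicator induces more characters;
    '*' induces the whole suffix match.
    """
    mode = template[0]
    if mode not in "&!":
        return template                      # nothing to induce
    body = template.lstrip(mode)
    count = len(template) - len(body)        # number of repeated indicators
    if body.startswith("*"):
        body = body[1:]
        count = len(suffix)                  # whole suffix match
    pulled = suffix[:count]
    return pulled + body if mode == "&" else body + pulled


def dollar_rule(text: str) -> str:
    """Apply the equivalent of [$](numeric+) = {&* dollars} to text."""
    out, i = [], 0
    while i < len(text):
        if text[i] == "$":
            j = i + 1
            while j < len(text) and text[j] in NUMERIC:
                j += 1
            if j > i + 1:                    # (numeric+) needs one or more
                out.append("{" + induce("&* dollars", text[i + 1:j]) + "}")
                i = j
                continue
        out.append(text[i])
        i += 1
    return "".join(out)


print(dollar_rule("$45,701"))
```

The same helper reproduces the single-character forms: induce("& and
twenty", "4") puts the induced digit first, and induce("!twenty ", "4")
puts it last.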

   2.3 Added Zero or One Match Condition.

       It is now possible to specify a zero or one match condition.
       This allows the specification of optional prefixes or suffixes
       that only occur once.

   (Problems Solved)

   2.4 Fixed Word Separator Problem.

       Translator42 does not recognize the same set of word
       separators as the original Translator37. This causes unusual
       pronunciation when punctuation marks or numbers are combined
       with text. In Translator43 the %Separator statement of
       Translator42 has been replaced by an %Alphabet statement. All
       characters not in this Alphabet Set are treated as
       word-separators. The default alphabet does include the
       ISO-8859-1 international characters.

   2.5 Fixed Buffer Overrun Problem.

       Translator42 may crash your system if the buffer that is
       provided by the program using this library is not large enough
       to handle the resulting text. The translator is supposed to
       stop short if this happens and report how much text it did
       translate. Translator37 does this. However, Translator42 can
       fail to notice the end of the buffer and continue on writing
       into unauthorized memory space. This problem is now fixed.

   2.6 Fixed The False, In-line Text Command Problem.

       Translator42 allows accent and scope changes to be made by
       in-line text commands delimited by simple braces and
       backslashes. This can be a problem when reading general text
       that may contain these characters or ASCII art. Translator42
       does allow this feature to be turned off using the
       Translator42 preference tool to delete these characters in the
       boxes provided. It should not be necessary to do this with
       translator43. Translator43 reduces the severity of this
       problem by requiring that a rubout character, 7F hex, precede
       each in-line text command in the text being read.

   2.7 Added Assembly Code Modules.

       Several simple repetitive routines, including the built-in
       unsigned byte strchr() routine, have been replaced by hand
       optimized assembly modules for increased processing speed.
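
The "stop short and report" behaviour that 2.5 expects of the translator
can be sketched as follows. This is a simplified model in Python, not the
library source: translate_bounded() is a hypothetical name, upper-casing
each word stands in for real phoneme generation, and the negative return
value follows the convention described in 7.2.1.

```python
# Simplified model of bounded output; translate_bounded() is hypothetical
# and upper-casing stands in for real phoneme translation.

def translate_bounded(text: str, out_size: int):
    """Translate word by word without writing past out_size bytes.

    Returns (status, output). status is 0 when everything fitted,
    otherwise the negated count of input characters that were fully
    translated before the buffer ran out.
    """
    if out_size < 1:
        raise ValueError("output buffer needs room for the terminating null")
    out = bytearray(out_size)
    used = 0        # bytes written so far, not counting the null
    consumed = 0    # input characters fully translated
    for word in text.split(" "):
        piece = (word.upper() + " ").encode("latin-1", errors="replace")
        # Check BEFORE writing: the piece plus a closing null must fit.
        if used + len(piece) + 1 > out_size:
            out[used] = 0                     # close off at the last full word
            return -consumed, bytes(out[:used])
        out[used:used + len(piece)] = piece
        used += len(piece)
        consumed += len(word) + 1
    return 0, bytes(out[:used])


print(translate_bounded("hello big world", 12))
print(translate_bounded("hello big world", 64))
```

The point of the sketch is the order of operations: the length check
happens before any byte is copied, so the write can never run past the
end of the caller's buffer.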

   (Problem Avoided)

   2.8 Dropped External Language Reference Rule Capability.

       This feature only works in Translator42 if the user does not
       disable or change the definitions of the in-line language or
       scope changing rules. This rarely used feature was dropped due
       to the performance impact and complexity of the filter that
       would be required to prevent text induction of false scope
       codes.


3. Requirements.

   A complete installation of Translator42.4 or Translator43.0 is
   required. See section 3 of Translator.man supplied with
   Translator42 for that installation procedure, if required.
   Upgrading to 43.0 or downgrading to 42.4 is not required for this
   patch.


4. Installation.

   4.1. (Optional Precaution) Back up your SYS: partition, or Libs:
        directory.

   4.2. Install Translator42.4 if this has not been done. This step
        is _NOT_ required if you have already upgraded to
        Translator43.0. This patch only works on fully installed
        versions of Translator42.4 or 43.0.

   4.3. Unpack the Tran43pch archive to a convenient directory.

   4.4. Stop and close out (exit/quit) all programs that might be
        using the translator.library. If possible, don't start any
        such programs before the installation.

   4.5. Run the Installer Script by clicking on the Install icon in
        the unpacking directory. This patches the installed sub-type
        (v33, v37, or 020) of Translator42.4 or Translator43.0 to the
        equivalent Translator43.1 sub-type. The previous
        translator.library will be renamed to translator42.4xlibrary
        or translator43.0xlibrary and a new translator.library will
        be patched in. If you are upgrading from 42.4 and have the
        Italiano.accent in your Locale:accents directory, this accent
        will be renamed to Italiano42.Xaccent and a new
        Italiano.accent will also be patched in. This new accent has
        only one line changed for compatibility with Translator43. If
        you elect to create a log file, this file will be created in
        the unpack directory you have selected.

   4.6.
        If the old translator.library was resident in ram: due to
        prior use, it may be necessary to use "avail flush" or
        "flushlib translator.library remove" or reboot the system
        before the upgrade takes effect.

   4.7. (Optional) Copy the Update.man file to a directory of your
        choice for future reference.

   4.8. (Optional) Unpack the Specialized Translator43 Accent files
        and copy them to Locale:accents.


5. New Accent Files.

   The new accent files will be uploaded independently. I am using an
   "Ax_(n)" format prefix on the archive name to group them together
   and assure that version information is not lost if the names are
   truncated to 8.3 format. Use a directory utility to copy these
   demo accent files to your Locale:Accents directory if you wish.
   Most of these accent files are experimental, as I only speak
   USA-English. I have chosen city names to indicate the experimental
   nature of these accents.

   5.1 Berlin.Accent. (Ax_1Berlin.lha)

       An experimental German accent demonstrating some of the new
       features of this version and some special phoneme combinations
       to overcome the lack of the proper German CH sounds in
       Narrator 37.7. The name "Berlin" is chosen to distinguish it
       from the authentic deutsch.accent developed by native German
       speakers. It was developed from the deutsch.accent, version
       0.1 by Stefan Zeiger and the rules stated in the Pronunciation
       chapter, pp 265-267, of "Der Anfang (Understanding and Using
       German)" by Harold von Hofe, 1958. This accent is optimized
       for use with Narrator 37.7 and requires Translator43.

   5.2 Chaucer.Accent. (Ax_0Chaucer.lha)

       An experimental generalized Middle English accent: English
       before the "Great Vowel Shift" with trilled Rs and guttural gh
       sound. Based on the brief description of Middle English in "A
       History of the English Language" by Albert C. Baugh. This
       accent has been optimized with Narrator 37.7 and requires
       Translator43.

   5.3 Paris.Accent.
       (Ax_0Paris.lha) A rather crude experimental stop-gap French
       accent developed from several English guides on French
       pronunciation. Optimized with Narrator 37.7. Requires
       Translator43.

   5.4 !USA.Accent. (Ax_1USA.lha)

       This is an extensive USA American accent developed from
       scratch. It is about 12 times the size of the standard
       "American accent." The goal is to approximate the USA
       Broadcast Standard Accent. The following features are
       included:

       a. Arabic and Roman numeral conversion
       b. Silent 'e' detection in many compound words
       c. British spelling recognition
       d. Resolution strategies for some common homographs (lead,
          live, read, wind), words with the same spelling but
          different pronunciations.
       e. Recognition of many place and personal names
       f. Conversational, non-formal pronunciation

       The large size of this accent allows a much higher accuracy
       than the standard American accent; however, 68000 based
       systems may experience a 2 second delay per 80 character line
       of text and a 30 to 60 second initial loading delay while the
       accent is compiled at first use. The speech is quite snappy on
       a 68040 based system at my recommended 210 word per minute
       speaking rate setting.

       The basic reason for the large size of this file is that the
       spelling of English words has become more a matter of
       tradition than phonetics. The traditional spellings of most
       basic English words were established in the Middle English
       period based on a Latin model. Most of the changes in English
       pronunciation since that time are not reflected in the
       spelling. These changes include final e silencing, loss of the
       guttural gh sound, and the Great (Long-)Vowel Shift. Words
       imported or "borrowed" from other languages tend to retain
       their native spellings. The rules for spelling and pronouncing
       imported words from classical Latin and Greek have remained
       essentially the same. (This is why "machine" does not rhyme
       with "shine".)
       Through these and other processes, the pronunciation rules for
       English words have become quite complicated.


6. Translator Preferences Program and Other Utilities.

   See the Translator42 Translator.man for information on the
   Translator42 utilities. For Translator43, the example of how a new
   accent may be selected by in-line text should be modified to read
   as follows:

       \english{ Hello. Beastly hot weather this! Yes, hello. My name
       is {\maori Hone Ropata} and I am \maori{Maori.}}

   Each in-line command must be preceded by the "rub-out" character,
   ASCII 127 decimal, which is non-printing and so is not visible in
   the example. It was added in Translator43 as an additional
   qualifier to reduce problems reading general text that may contain
   ASCII art. Do NOT try to insert this "rub-out" character in the
   preference boxes.

   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *
   * Note: The information in the following section is primarily    *
   * for those who wish to create or modify accent files. Other     *
   * users may ignore this section.                                 *
   * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * *


7. Accent File Format.

   For the most part, accent files for Translator42 will be
   compatible with Translator43. The Italiano.accent must be changed
   because it has an embedded reference to the English accent that is
   no longer supported. A patch for a modified Italiano.accent is
   included. The only possible problems would be with files that use
   the defunct %separator directive, that use [¶ in the match string,
   or that use {! or {& in the phoneme string. See translator.man for
   a complete description of the basic format.

   Each line of the file may be one of the following types:

   1. Blank lines are ignored.
   2. Comment lines beginning with `#' are ignored.
   3. Directives begin with `%'.
   4. All other lines are pronunciation rules.

   7.1. Directives.

       Directives in an accent file are introduced by a percent
       character (%) followed by the name of the directive and its
       arguments. Below is a summary of the directives in
       Translator43.
       The Translator42 manual, translator.man, is referenced where
       no change applies.

       7.1.1. Directive:   stress.
              Syntax:      %stress <N>
              Description: See Translator.man.

       7.1.2. Directive:   emphasis.
              Syntax:      %emphasis <N>
              Description: See Translator.man.

       7.1.3. Directive:   class
              Syntax:      %class <member> [ <member> ... ]
              Description: See Translator.man.

       7.1.4. Directive:   complain
              Syntax:      %complain <level>
              Description: See Translator.man.

       7.1.5. Directive:   alphabet (new with version 43)
              Syntax:      %alphabet <character list>
              Description: This command can be used to define the
                           *characters* that constitute words. The
                           default is approximately equivalent to the
                           following:

              %alphabet aáàâãä bcd ð éèêë fgh iíìîï jklmnñ oóòôõö pqrs ß t þ uúùûü vwxyz

              Note that UNLIKE the class entries, only single
              characters are accepted; spaces are totally ignored as
              delimiters. These characters are used to build a random
              access lookup table that defines the alphabet status of
              each character. Each character in the list causes its
              table entry to be set to one.

       7.1.6. Directive:   separator (obsolete, not recognized)

   7.2 New Context Rules.

       See 7.2 of translator.man for the basic context rules.

       7.2.1 Background: General Sequence of Operation.

       The client program that calls the Translator Library provides
       a pointer to the text string to be translated, its length in
       bytes, a pointer to a buffer to hold the translated output,
       and its length in bytes. The input source string is copied to
       a new reference buffer, delimited on each end with nulls, and
       converted to upper-case. Then the source reference buffer is
       translated by a progressive, single pass process that searches
       character by character for applicable rules.

       Rules consist of:
       (1.) an optional prefix requirement string,
       (2.) a mandatory [match] string,
       (3.) an optional suffix requirement string, and
       (4.) a mandatory = followed by phoneme replacement string, or
            {text replacement string}, or an empty space.

       Examples:

       LAB[OURA]TORY = RAH     | OURA is converted to phonemes RAH.
       $T[A]K(vowel)$ = EY4    | A is converted to EY4.
                               |
       $["WW\ II"]$ = { world war two }
                               | WW II will be converted to string
                               | "world war two" and then, that string
                               | may be translated to something like
                               | "WER4LD WAA4R TUW4" independently.
                               |
       $MAK[E]$ =              | Silent E is silenced.

       Each rule's [match] string is compared with the text in the
       reference buffer; then the prefix and the suffix requirements
       are tested. If the prefix and suffix requirements are
       satisfied, then the replacement rule is applied. This rule
       either provides output phoneme text directly (the normal case)
       or provides a replacement string that is decoded in isolation
       to create the phonemes for the matched text. The process then
       continues on the source reference buffer at the next unmatched
       character. When the end of the source reference buffer is
       reached, a normal null return is executed. If the end of the
       output buffer is reached first, the output buffer is closed
       off with a null at the end of the last fully translated word
       and the routine returns a negative number representing the
       number of fully translated characters (if less than -8).

       7.2.2 Modified: Pattern Codes.

       The left and right contexts are strings which may contain
       pattern codes. These include:

       (<class>)    Must match one member of a class
       (<class>+)   Must match one or more members of a class
       (<class>;)   Must match zero or only one member of a class <new>
       (<class>*)   Must match zero or more members of a class
       (<class>~)   Must not be a member of a class
       @            Must be an alphabet character                 <new>
       $            Must be a non-alphabet word separator         <modified>

       7.2.3 New: Empty Match.

       The match string may now contain the empty match indicator ¶
       (ALT P). If the prefix and suffix conditions match, then the
       specified phonemes or text characters are inserted at that
       point. An empty match does not, BY ITSELF, advance the
       translation pointer on the input reference buffer.
       Once an empty match has been found and executed, it and all
       preceding empty matches are disabled until the current
       reference buffer character has been processed. Empty matches
       may be hidden by a preceding normal match, but they will not
       hide succeeding empty or normal matches at the same character
       position on the input reference buffer or current source
       string.

       7.2.4 Background: Search Lists.

       As in Translator42, rules are placed in one of 27 lists,
       depending on the first character of the match string. Thus,
       when we encounter a letter `A' in the source text to be
       translated, we save time by only looking through the list of
       rules with match strings beginning with the letter `A'. The
       relative order of the rules in each list is the same as that
       of the whole accent file.

       7.2.5 New: Rule List Cross-Posting.

       The rules for the empty match strings go in list zero for
       non-alphabetic character rules. On an empty match condition,
       the current source text character is equivalent to the first
       character of the rule's suffix pattern rather than the first
       character of the match string. Thus, if this character
       corresponds to one of the other 26 lists, the rule would not
       be found. To enable empty matches in these other lists, the
       syntax [¶@] (ALT-P Shift-2) has been added to cause an empty
       match rule to be cross-posted to all lists. Each cross-post
       stub references the entry in list zero. Empty match
       cross-posting is not automatic because the predominant usage
       of this function is with numbers, where it is not required.
       Cross-posting is only required if the first character of the
       required suffix pattern may be alphabetic.

       7.2.6 New: Suffix Text Induction.

       Translator43 allows the induction of text following matched
       text into the text replacement string on text replacement
       rules. Text induction modes are indicated by the first
       character after the leading brace.
       In these modes, a temporary string is created that combines
       the bracketed text from the rule's replacement text string
       with a delimited number of suffix characters. Leading or text
       swap induction is indicated by {& ...} and trailing induction
       is indicated by {!...}. For example, the rule:

           [2](digit)(digit~) = {& and twenty}

       will create a replacement string "4 and twenty" for 24 and the
       rule:

           [2](digit)(digit~) = {!twenty }

       will create a replacement string "twenty 4" for 24. This
       feature should only be used where the suffix pattern rule
       guarantees the nature of the induced characters. Text
       induction advances the current source pointer for each
       character induced even with an empty match.

       7.2.7 New: Text Induction Syntax.

       The number of characters to be induced may be specified by
       repeating the text induction indicator. For example, the
       rules:

           #short number indicator
           %class Ñ 0 1 2 3 4 5 6 7 8 9
           #short number or number and comma separator
           %class Ç 0\, 1\, 2\, 3\, 4\, 5\, 6\, 7\, 8\, 9\, 0 1 2 3 4 5 6 7 8 9
           [¶](Ñ)(Ñ)(Ç) (Ñ)(Ñ)(Ñ) (Ñ~) = {&&& thousand}

       will create a replacement string "375 thousand" from 375699.
       It is also possible to induct the whole matching suffix by
       placing an `*' after the text induction specifier as in the
       following example:

           #general numeric class
           %class numeric \, \. 0 1 2 3 4 5 6 7 8 9
           [$](numeric+) = {&* dollars}

       where $1,235.23 would create a replacement string "1,235.23
       dollars". Note that space, quotes, ¶ (ALT P), ), (, ], [, and
       \ are the only characters that must be escaped in a match
       string if used as literal characters.

       7.2.8 CAUTION: Recursion Hazard.

       Accent file programmers should be aware that there is an
       increased risk of rule recursion in text induction rules,
       especially with empty pattern match and whole suffix induction
       rules. These text induction rules must be written to prevent
       the application of that SAME rule to its OWN replacement
       string.
       If the whole suffix is inducted on an empty match [¶], then
       there MUST be a prefix pattern requirement that CAN NOT be met
       by the text in the new replacement string.

       Examples:

       Bad Rule --

           $[¶](numeric+)$ = {!* number }

       This rule would cause the creation of a new string containing
       the text " number " and the class numeric text. The text
       "number" would be converted to something like "NAH4MBER" in
       the output buffer and respond to the class numeric text string
       by spawning an additional "number"-numeric string just like
       the previous string. This recursive spawning would continue
       until the maximum recursion limit is reached. Translator42/43
       allow replacements nested 64 deep. If this limit is exceeded,
       the program aborts the current line of text.

       Better Rule --

           %class numeric 0 1 2 3 4 5 6 7 8 9 \. \,
           %class numberdone "number " 0 1 2 3 4 5 6 7 8 9 \. \,
           $(numberdone~)[¶](numeric+)$ = {!* number }

       The prefix requirement, (numberdone~), will prevent the
       recursive application of the rule. To be fully effective, the
       class numberdone or the prefix requirement should include
       provisions to anticipate the effects of all your other
       replacement rules on the numeric sequence.

       Good Rule --

           %class num 0 1 2 3 4 5 6 7 8 9
           %class tmark \,
           (num~)(num;)(num;)(num)(tmark;)[000](num~)={ thousand }
           (num~)(num;)(num;)(num)(tmark;)[00](num)(num~)={!* thousand and }
           (num~)(num;)(num;)(num)(tmark;)[0](num)(num)(num~)={!* thousand and }
           (num~)(num;)(num;)(num)(tmark;)[¶](num)(num)(num)(num~)={!* thousand }

       Note that this rule would convert a number string like

           [... 65,321 ...]   to   [... 65,{ thousand 321 } ...].
                                           ^              ^

       Recursion does not occur in this case because the required
       prefix does not exist in the new string { thousand 321 }. This
       example assumes that the "65," has been processed.

       7.2.9 Eliminated: Language Changing Directives.

       Language changing in-line directives within the braces are no
       longer supported in Translator43.
       The previous example of this in the Translator42 version of
       the Italiano accent file:

           [computer] = {\english computer}

       must be changed to:

           [computer] = KUMPYUW3TAH

       to produce the same effect. This is the only known instance
       where this feature was used. Direct insertion of the required
       phonemes eliminates potential problems resulting from the user
       redefining or disabling these directives and removes the
       requirement that the other language be present. The translator
       preferences tool may be used to determine the phonemes to be
       copied from a foreign language by the accent file programmer.

   7.3 Phonemes.

       The phonemes listed in the original manual are reproduced here
       for easy reference.

       7.3.1 Narrator Considerations.

       The last versions of the Narrator device I have are version
       V33.2 (5 Mar 1986), file size 23280 bytes, issued with OS 1.3,
       and version V37.7 (22 May 1991), 65760 bytes, issued with OS
       2.04. The narrator programs function as programmable voice
       simulators and are capable of a wide range of effects.
       Narrator 33.2 simulates three vocal tract resonances or
       formants, the minimum required for good intelligible speech.
       Narrator 37.7 provides 5 formants for a more natural sounding
       voice. Also the frequencies and amplitudes of the three
       primary formants may be adjusted to change the quality of the
       voice. The original developer, SoftVoice Inc, is the only
       entity with the legal right to distribute or authorize
       distribution of that software.

       7.3.2 Non-English Phonemes.

       The basic phonemes provided by the narrator appear to be
       intended for English only. Narrator 37.7 appears to be more
       English specific than narrator 33.2 as it replaces the /C
       phoneme with CH. However, as each phoneme is blended with its
       surrounding phonemes, it may be possible to create vowel or
       consonant clusters that provide better approximations for
       non-English sounds. This is most effective at rapid speaking
       rates. The missing /C phoneme may be approximated by KZH,
       KZHQ, or KZH/H with narrator 37.7.
       In this case, the unvoiced surrounding consonants silence the
       ZH and the ZH muffles the impulse of the K sound.

       7.3.3 Phoneme List.

       The following is the list of "ARPAbet" phonemes used by the
       Narrator device.

       Vowels      English
         IY        bEEt, EAt
         IH        bIt, In
         EH        bEt, End
         AE        bAt, Ad
         AA        bArgain, tArget
         AH        tUg, bUg, bUt, Up
         AO        shORE, wAR
         UH        bOOk, sOOt
         ER        bIRd, EArly
         OH        bOrder (sounds like the letter 'O' when used by
                   itself)
         AX        About (never stressed)
         IX        solId (never stressed)

       Diphthongs
         EY        bAY, AId
         AY        bIde, I
         OY        bOY, OIl
         AW        bOUnd, OWl
         OW        bOAt, OWn
         UW        brEW, bOOlean, pOO, crEW (except that it is a
                   diphthong)

       Consonants
         R         Red
         RX        Red (This is not mentioned in RKRM:Devs)
         W         Wag
         M         Men
         NX        siNG
         S         Soon
         F         Fed
         Z         haS, Zoo
         V         Very
         CH        CHeck
         /H        Hole
         B         But
         D         Dog
         K         Keg, Copy
         L         Long
         LX        Long (This is not mentioned in RKRM:Devs)
         Y         Yellow
         N         No
         SH        SHy
         TH        THin
         ZH        pleaSure
         DH        THen
         WH        WHen
         J         JuDGE
         /C        supposedly loCH, or (German) baCH, but really like
                   CHurCH. Narrator version 37.7 pronounces this
                   sound like the German tsch, English ch as stated
                   above. KZH, /HZH; or KZH/H, /HZH/H when followed
                   by vowels; sound closer to the mark.
         P         Put
         T         Toy (except before IY when it is pronounced D)
         G         Guest

       Others
         DX        piTY (tongue flap)
         Q         kitt(Q)en (glottal stop)
         QX        (Silent vowel - can lengthen the previous vowel)

       Contractions
         UL   AXL                     IL   IXL
         UM   AXM (almost equal)      IM   IXM
         UN   AXN (almost equal)      IN   IXN

       Symbols
         Digits 1-9   Syllabic stress
         .            Sentence final character
         ?            Question sentence final character
         -            Phrase delimiter
         ,            Clause delimiter
         ()           Put parentheses about noun phrases
         ##           End of speech (undocumented)

       Translator
         `            Do not add stress marks to this word
         #            Word boundary for the purposes of adding stress
                      marks